Panda: A System for Provenance and Data

Authors

  • Robert Ikeda
  • Jennifer Widom

Abstract

Panda (for Provenance and Data) is a new project whose goal is to develop a general-purpose system that unifies concepts from existing provenance systems and overcomes some limitations in them. Panda is designed for “data-oriented workflows,” fully integrating data-based and process-based provenance. Panda’s provenance model will support a full range from fine-grained to coarse-grained provenance. Panda will provide a set of built-in operators for exploiting provenance after it has been captured, and an ad-hoc query language over provenance together with data. The processing nodes in Panda’s workflows can vary from well-understood relational transformations, to “semi-opaque” transformations with a few known properties, to fully-opaque “black boxes.” A theme in Panda is to take advantage of transformation knowledge when present, but to degrade gracefully when less information is available. Panda yields interesting optimization problems, including data caching decisions and eager vs. lazy provenance capture. This paper is largely an overview of motivation and plans for the project, with some material on current progress and results.
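The abstract contrasts fine-grained with coarse-grained provenance and says Panda should exploit transformation knowledge when it has it and degrade gracefully when it does not. The Python sketch below illustrates that idea under stated assumptions. It is not Panda's design or API. The `Record` type, the `select` and `black_box` helpers, and the dictionary representation of provenance (output id mapped to a set of input ids) are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

@dataclass(frozen=True)
class Record:
    rid: str    # hypothetical record identifier
    value: int

# Hypothetical provenance representation: output rid -> input rids it depends on.
Provenance = Dict[str, FrozenSet[str]]

def select(inputs: List[Record],
           pred: Callable[[Record], bool]) -> Tuple[List[Record], Provenance]:
    """Known transformation (a selection): each output is copied from exactly
    one input, so provenance can be recorded per record (fine-grained)."""
    outputs: List[Record] = []
    prov: Provenance = {}
    for r in inputs:
        if pred(r):
            out = Record(f"out:{r.rid}", r.value)
            outputs.append(out)
            prov[out.rid] = frozenset({r.rid})
    return outputs, prov

def black_box(inputs: List[Record],
              fn: Callable[[List[Record]], List[Record]]) -> Tuple[List[Record], Provenance]:
    """Opaque transformation: nothing is known about how outputs relate to
    inputs, so each output is conservatively linked to every input
    (coarse-grained)."""
    outputs = fn(inputs)
    all_inputs = frozenset(r.rid for r in inputs)
    return outputs, {out.rid: all_inputs for out in outputs}

def summarize(rs: List[Record]) -> List[Record]:
    # Stand-in for an arbitrary black-box step: emits one aggregate record.
    return [Record("total", sum(r.value for r in rs))]

data = [Record("a", 5), Record("b", 12), Record("c", 20)]

sel_out, sel_prov = select(data, lambda r: r.value > 10)
# sel_prov == {"out:b": frozenset({"b"}), "out:c": frozenset({"c"})}

bb_out, bb_prov = black_box(data, summarize)
# bb_prov == {"total": frozenset({"a", "b", "c"})}
```

When the transformation's semantics are known, as with the selection here, provenance can be traced to individual records. When nothing is known, the only safe answer is that each output may depend on all inputs. That conservative fallback is one way to read "degrade gracefully."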

Similar resources

Provenance, Tectonic Setting & Geochemical Maturity of The Early Miocene Pyawbwe Formation, Sakangyi–Thayet Area, Magway Region, Myanmar.

The best-exposed Early Miocene shales (820 m thick) and interbedded silty sandstone beds of the Pyawbwe Formation in the Sakangyi–Thayat area, Magway Region, are investigated geochemically using a Siemens SRS-X Ray 303 AS XRF spectrometer. Major and some trace element concentrations have been determined to infer their provenance, tectonic setting, paleoweathering, paleoclimate and ...

A Distributed Provenance Aware Storage System

The provenance of a file represents the origin and history of the file data. A Distributed Provenance Aware Storage System (DPASS) tracks the provenance of files in a distributed file system. The provenance information can be used to identify potential dependencies between files in a filesystem. Some applications of provenance tracking include (i) tracking the transformations applied to process...

History and Provenance of the “Chinese” Calendar in the Zīj-i Īlkhānī

This article sheds light upon a “Chinese” calendar described in the Zīj-i Īlkhānī. In previous studies, some characteristics of the calendar were ascribed to the “Uighurs.” However, I will show that it was not originally associated with the Uighur. This “Chinese” calendar was brought to Iran by the Chinese Taoist Fu Mengchi, who accompanied his ruler Hülegü. Fu Mengchi informed Naṣīr al-Dīn Ṭūsī ...

Language-integrated Provenance in Links

Today’s programming languages provide no support for data provenance. In a world that increasingly relies on data, we need provenance to judge the reliability of data and therefore should aim for making it easily accessible to programmers. We report our work in progress on an extension to the Links programming language that builds on its support for language-integrated query to support where-pr...

Trustworthy Whole-System Provenance for the Linux Kernel

In a provenance-aware system, mechanisms gather and report metadata that describes the history of each object being processed on the system, allowing users to understand how data objects came to exist in their present state. However, while past work has demonstrated the usefulness of provenance, less attention has been given to securing provenance-aware systems. Provenance itself is a ripe atta...

Journal title:
  • IEEE Data Eng. Bull.

Volume 33

Publication date: 2010